Text File | 1989-09-21 | 3.8 KB | 126 lines | [TEXT/EDIT] |
- Finding Duplicates
- ------------------
-
- This StuffIt archive should contain:
-
- FileTree 4.97 - a shareware disk cataloging utility by Jody S. Kravitz
-
- Plus the following 4 freeware utilities by Mark J. Smith:
-
- FormatTree 0.1 - utility to reorganize FileTree output
- SortTree 0.1 - utility to sort reformatted FileTree output
- FindExact 0.1 - utility to find "exact" matches
- FindOthers 0.1 - utility to find other "suspicious" matches
-
- Here is a brief step-by-step guide to finding duplicate files using
- the above utilities.
-
- 1. Launch FileTree
- 2. Using the File menu, create an output file
- 3. Using the Options menu, configure FileTree to report only:
-
- (a) Total File Size
- (b) Full Path Names
-
- 4. Select a volume to catalog.
-
- 5. Launch FormatTree and reformat the FileTree output.
- 6. Launch SortTree (requires 1.3 MB) and sort the reformatted output.
- 7. Launch FindExact to search the sorted output for duplicates.
- 8. Launch FindOthers to search for additional duplicates.
-
- Note: You can use another program or utility to sort the reformatted
- output (especially if SortTree's memory requirement is a problem), but
- you must first open the reformatted file and remove its first 5 lines
- and last 3 lines of text. SortTree does this for you automatically.
-
- A few words about each of the freeware utilities:
-
- FormatTree 0.1
- --------------
-
- This utility reformats output generated by the FileTree program.
-
- The output must contain only 2 columns of information:
-
- (1) the file size in the 1st column
- (2) the full pathname in the 2nd column
-
- FormatTree will split this information into 3 columns as follows:
-
- (1) the filename in the 1st column
- (2) the file size in the 2nd column
- (3) the folder pathname in the 3rd column
-
-
- SortTree 0.1
- ------------
-
- This utility sorts the output generated by the FormatTree program.
-
- SortTree ignores the first 5 lines and last 3 lines of the input file.
- Otherwise, SortTree is a general-purpose Quicksort program that can
- be used to sort any text file containing fewer than 12,000 lines.
-
- If you use another program to sort the output from FormatTree, you
- need to remove the first 5 and last 3 lines manually before sorting.
-
- SortTree requires 1.3 MB of RAM under both Finder and MultiFinder.
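-
- If you do sort with another tool, the trimming step amounts to something
- like the following Python sketch. It is a stand-in, not SortTree itself:
- it uses Python's built-in sort rather than Quicksort, and the
- case-insensitive ordering is an assumption. The name sort_body is
- hypothetical.

```python
def sort_body(lines, header=5, footer=3):
    """Drop the first `header` and last `footer` lines, then sort the rest.

    Assumption: 'alphabetical order' means case-insensitive ordering.
    """
    body = lines[header:len(lines) - footer]
    return sorted(body, key=str.lower)


if __name__ == "__main__":
    sample = ["h1", "h2", "h3", "h4", "h5", "b", "A", "c", "f1", "f2", "f3"]
    print(sort_body(sample))
```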
-
-
- FindExact 0.1
- -------------
-
- This utility searches for exact matches between pairs of adjacent
- filenames. For this reason, input into this program must first be
- sorted into alphabetical order.
-
- FindExact differs from a plain string comparison in that it:
-
- (1) is case insensitive
- (2) strips leading, trailing and embedded spaces
- (3) strips underscore characters
- (4) strips filename extensions
-
- FindExact will report "My File.pit", "my_file.sit" and "MyFile.01" as
- exact duplicates.
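-
- The four rules above can be sketched in Python as follows. This is a
- reading of the description, not FindExact's source: the names normalize
- and find_exact are hypothetical, and treating everything after the last
- period as the extension is an assumption.

```python
def normalize(name):
    """Apply the four rules: ignore case, spaces, underscores, extension."""
    # Assumption: the extension is whatever follows the last period.
    stem = name.rsplit(".", 1)[0]
    return "".join(ch for ch in stem.lower() if ch not in " _")


def find_exact(sorted_names):
    """Return adjacent pairs whose normalized names are identical."""
    pairs = zip(sorted_names, sorted_names[1:])
    return [(a, b) for a, b in pairs if normalize(a) == normalize(b)]


if __name__ == "__main__":
    names = ["My File.pit", "my_file.sit", "MyFile.01"]
    for a, b in find_exact(names):
        print(a, "==", b)
```

- Because only neighbouring lines are compared, the sketch makes a single
- pass over the list, which is why the input must already be sorted.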
-
-
- FindOther 0.1
- -------------
-
- This utility searches for high probability matches between pairs of
- adjacent file names. For this reason, input into this program must
- first be sorted into alphabetical order.
-
- FindOther finds and discards matches detected by FindExact. It then
- searches for file names that have at least 75% of their characters in
- common.
-
- FindOther can find duplicates like:
-
- 'Animation Stack' and 'AnimationStak.sit'
- 'DeskPict.sit', 'DeskPict1.0' and 'DeskPict_1.1.sit'
- 'GateKeeper111.sit' and 'Gate_Keeper_1.1.sit'
-
- Note: FindOther will report many more non-duplicates than duplicates.
- However, it reduces the search space (for you, the user) to more
- manageable proportions by reporting only suspect cases (those with a
- high probability of being duplicates). Its usefulness lies in its
- ability to identify cases like those illustrated above.
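-
- This ReadMe does not say how "characters in common" is measured. The
- Python sketch below shows one possible reading, assumed here purely for
- illustration: count the characters two names share (ignoring case,
- spaces and underscores) and divide by the length of the longer name.
- The names squash, common_fraction and find_suspects are hypothetical,
- and the sketch does not discard FindExact's matches first.

```python
from collections import Counter


def squash(name):
    """Lowercase the name and drop spaces and underscores."""
    return "".join(ch for ch in name.lower() if ch not in " _")


def common_fraction(a, b):
    """Fraction of shared characters, relative to the longer name.

    Assumption: characters are counted without regard to position.
    """
    a, b = squash(a), squash(b)
    shared = sum((Counter(a) & Counter(b)).values())
    longest = max(len(a), len(b))
    return shared / longest if longest else 1.0


def find_suspects(sorted_names, threshold=0.75):
    """Return adjacent pairs that meet the similarity threshold."""
    pairs = zip(sorted_names, sorted_names[1:])
    return [(a, b) for a, b in pairs if common_fraction(a, b) >= threshold]


if __name__ == "__main__":
    print(find_suspects(["Animation Stack", "AnimationStak.sit", "ReadMe"]))
```

- A measure this loose ignores character order, so it will also flag
- unrelated names that happen to use the same letters. That is consistent
- with the warning above about non-duplicates.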
-
- For further information or source code, please contact Mark J. Smith
- at one of the following locations:
-
- GEnie: MJMS
- BIX: MJMS
-
- MAC-LINK BBS: 514-935-4257 (sysop)
-
- DMI Systems
- 1028 Greene Ave.
- Montreal, QC H3Z 1Z7
- CANADA
-
- End of ReadMe.
-
-